2025-06-04 10:13:15.AIbase.18.6k
Stanford's Latest Evaluation: DeepSeek R1 Medical AI Model Outperforms Google and OpenAI with High Scores
Recently, Stanford University released a comprehensive evaluation of clinical medical AI models. DeepSeek R1 stood out as the champion among nine leading large models, achieving a 66% win rate and a macro average score of 0.75. The highlight of this evaluation is that it not only focuses on traditional medical license exam questions but also delves into the daily work scenarios of clinical doctors, providing more practical assessments. The evaluation team developed an integrated assessment framework called MedHELM, which includes 35 benchmarks covering 22 subtasks in medicine.